Abstract

Our goal in this article is to review the known properties of the mysterious Kolakoski sequence
and at the same time look at generalizations of it over arbitrary two-letter alphabets. Our
primary focus here will be the case where one of the letters is odd while the other is even,
since in the other cases the sequences in question can be rewritten as (well-known) primitive
substitution sequences. We will look at word and letter frequencies, squares, palindromes
and complexity.

1. Introduction

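A one-sided infinite sequence $z$ over the alphabet $A = \{1, 2\}$ is called a (classical) Kolakoski
sequence if it equals the sequence defined by its run-lengths, i.e.:

$$z = \underbrace{22}_{2}\,\underbrace{11}_{2}\,\underbrace{2}_{1}\,\underbrace{1}_{1}\,\underbrace{22}_{2}\,\underbrace{1}_{1}\,\underbrace{22}_{2}\,\underbrace{11}_{2}\,\underbrace{2}_{1}\,\underbrace{11}_{2}\cdots, \qquad \text{run-lengths: } 2211212212\ldots = z.$$

Here, a run is a maximal subword consisting of identical letters. The sequence $z' = 1z$ is
the only other sequence which has this property.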
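The defining property can be checked mechanically. The following is a minimal sketch, not taken from the article: the function names `kolakoski` and `run_lengths` are illustrative choices, and the generator simply writes runs whose lengths are read off from the terms already produced.

```python
from itertools import groupby


def kolakoski(n, first=2, second=1):
    """Return the first n terms of the sequence over {first, second} that
    starts with `first` and equals its own run-length sequence."""
    seq = []
    letter, other = first, second
    i = 0  # index of the term that prescribes the length of the next run
    while len(seq) < n:
        # If the prescribing term has not been written yet, it is the first
        # letter of the run being written, so the run length equals `letter`.
        run_len = seq[i] if i < len(seq) else letter
        seq.extend([letter] * run_len)
        letter, other = other, letter
        i += 1
    return seq[:n]


def run_lengths(seq):
    """Lengths of the maximal runs of identical letters in seq."""
    return [len(list(group)) for _, group in groupby(seq)]


z = kolakoski(200)
lengths = run_lengths(z)[:-1]  # the last run may be cut off, so drop it
print("".join(map(str, z[:10])))   # prints 2211212212
print(lengths == z[:len(lengths)])  # prints True
```

Calling `kolakoski(n, 1, 2)` gives the second sequence $z' = 1z$.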


This sequence was introduced by Kolakoski in [22], who asked "What is the nth term? Is
the sequence periodic?"¹ It has attracted attention over the years since, although
it is easy to define, it resists any attempt to reveal even some of its most basic properties,
like recurrence or the frequency of its letters. There is even some prize money offered for
answering some of these questions about its properties, see [20, 21]. Perhaps the most basic
question is known as Keane's question [19]:

    Does the frequency of the symbol 1 in z = 221121… exist, and is it equal to 1/2?

The line of attack in trying to answer this question has often been to detect some structure
by rewriting the generation rule of the Kolakoski sequence as some sort of generalized
substitution rule, see for example [31, 14], [28, Section 4.4] and references therein. However,
these attempts have not been successful in answering Keane's question.

¹ The first question is still studied today, see [32] and [15]. In these articles, recursive formulae for the
nth term are derived, thus answering the first question.



Our goal in this article is more humble - we want to give an overview of (the little) that is
known about the Kolakoski sequence and at the same time look at generalizations to arbitrary
two-letter alphabets A = {r, s} where r and s are natural numbers (with r ≠ s). We note
that "10" or "439" in this generalization is one letter, not two or three, and we can (well, if
we really want to) examine the Kolakoski sequence(s) over the alphabet A = {10, 439}.

If we make this generalization, then we find some easy cases for which we can answer Keane's
question immediately. For this we use the observation made in [10]: One can obtain the
Kolakoski sequence z above by starting with 2 as a seed and iterating the two substitutions

$$\sigma_0:\ \begin{array}{l} 1 \mapsto 1 \\ 2 \mapsto 11 \end{array} \qquad\text{and}\qquad \sigma_1:\ \begin{array}{l} 1 \mapsto 2 \\ 2 \mapsto 22 \end{array}$$

alternatingly, i.e., $\sigma_0$ substitutes letters on even positions and $\sigma_1$ letters on odd positions:

$$2 \mapsto 22 \mapsto 2211 \mapsto 221121 \mapsto 221121221 \mapsto \ldots$$

Clearly, the iterates converge to the Kolakoski sequence z (in the obvious product topology),
and z is the unique (one-sided) fixed point of this iteration.

Similarly, a (generalized) Kolakoski sequence over an alphabet A = {r, s}, which is
again equal to the sequence of its run-lengths, can be obtained by iterating the two
substitutions

$$\sigma_0:\ \begin{array}{l} r \mapsto r^r \\ s \mapsto r^s \end{array} \qquad\text{and}\qquad \sigma_1:\ \begin{array}{l} r \mapsto s^r \\ s \mapsto s^s \end{array}$$

alternatingly. Here, $a^b$ denotes a run of b letters a, i.e., $a^b = a \ldots a$ (b times).
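The alternating iteration can be sketched in a few lines. This is an illustration under stated assumptions rather than the article's own code: the names `substitute` and `iterate` are hypothetical, positions are counted from 1, the letter at an odd position is replaced by a run of the first letter and the letter at an even position by a run of the second letter, and the seed letter is assumed to be at least 2 (a seed 1 would be mapped to itself).

```python
def substitute(word, first, second):
    """One round of the alternating substitution: the letter x at an odd
    position (1-based) becomes x copies of `first`, the letter x at an even
    position becomes x copies of `second`."""
    out = []
    for index, x in enumerate(word):
        letter = first if index % 2 == 0 else second
        out.extend([letter] * x)
    return out


def iterate(first, second, rounds):
    """Apply the alternating substitution `rounds` times to the seed `first`."""
    word = [first]
    for _ in range(rounds):
        word = substitute(word, first, second)
    return word


for k in range(5):
    print("".join(map(str, iterate(2, 1, k))))
# prints 2, 22, 2211, 221121, 221121221 (one word per line)
```

Each iterate is a prefix of the next one, which is why the iteration converges to a fixed point; `iterate(2, 3, k)` does the same over the alphabet {2, 3}.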

Let us now assume that both r and s are even numbers. Building blocks of two letters,
A = rr and B = ss, and applying the alternating substitution rule to them, one actually
obtains a usual substitution rule for A and B:

$$A \mapsto A^{m} B^{m}, \qquad B \mapsto A^{n} B^{n},$$

where $m = r/2$ and $n = s/2$. In fact, from this (primitive) substitution rule it is easy to see that
the frequencies of the letters r and s in the original sequence must be equal, see [29, 30].

Let us now assume that both r and s are odd numbers. Again, building blocks of two
letters helps, although we need three such blocks here: A = rr, B = rs and C = ss. For
these three letters one again obtains a usual (primitive) substitution rule:

$$\sigma:\quad A \mapsto A^{m} B\, C^{m}, \qquad B \mapsto A^{m} B\, C^{n}, \qquad C \mapsto A^{n} B\, C^{n} \qquad\qquad (1)$$

where $m = (r-1)/2$ and $n = (s-1)/2$. From this representation it is straightforward to calculate the
letter frequencies in the corresponding Kolakoski sequence. However, here the frequencies of
r and s are not equal², see [1, 29].
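To illustrate how the letter frequencies follow from a block substitution of this kind, here is a hedged sketch. It assumes that r denotes the first letter of the sequence, that the blocks are A = rr, B = rs, C = ss with images $A^m B C^m$, $A^m B C^n$, $A^n B C^n$ where $m = (r-1)/2$ and $n = (s-1)/2$, and it approximates the Perron-Frobenius eigenvector of the substitution matrix by plain power iteration. The function names are illustrative.

```python
def kolakoski(n, first, second):
    """First n terms of the self-describing sequence starting with `first`."""
    seq, letter, other, i = [], first, second, 0
    while len(seq) < n:
        run_len = seq[i] if i < len(seq) else letter
        seq.extend([letter] * run_len)
        letter, other = other, letter
        i += 1
    return seq[:n]


def frequency_of_first_letter(r, s, iterations=200):
    """Frequency of r (the first letter) for odd r and s, computed from the
    substitution matrix of the blocks A = rr, B = rs, C = ss."""
    m, n = (r - 1) // 2, (s - 1) // 2
    # matrix[i][j] = number of blocks of type i in the image of block j
    matrix = [[m, m, n],
              [1, 1, 1],
              [m, n, n]]
    v = [1.0, 1.0, 1.0]
    for _ in range(iterations):  # power iteration, normalised to sum 1
        v = [sum(matrix[i][j] * v[j] for j in range(3)) for i in range(3)]
        total = sum(v)
        v = [x / total for x in v]
    freq_a, freq_b, _ = v
    # A contributes two letters r per block, B contributes one
    return freq_a + freq_b / 2


z = kolakoski(20000, 3, 1)
print(frequency_of_first_letter(3, 1))  # value predicted by the matrix
print(z.count(3) / len(z))              # empirical frequency of the letter 3
```

For the alphabet {1, 3} with first letter 3, both numbers come out close to 0.6028, so the two letters are indeed not equally frequent.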

We will therefore look at the generalizations of the Kolakoski sequence in this article
where one of the letters in the alphabet is odd while the other is even. We will not look at
generalizations to three-letter alphabets (see for example [2]) since there the situation is in
general³ certainly worse than for two-letter alphabets (where we only alternate between two letters).



2. Derivatives and Primitives 

Broadly speaking, there are (currently) two approaches to studying a Kolakoski sequence: Either
one tries to examine the set of all (infinite) sequences over A = {r, s} with the property that
their run-length sequence is also a sequence over the same alphabet A = {r, s} (and the
run-length sequence of the run-length sequence - and so on - is also a sequence over A = {r, s}).
Or, one tries to study the set of all possible (finite) subwords (or factors) of the Kolakoski
sequence. This leads to the study of so-called $C^\infty$-words. We will introduce $C^\infty$-words in
this and the next section, and will show how the former approach via sequences is used in
Section 4.

We start with some basic definitions. Let A be an alphabet, which throughout this article
will always be a two-letter alphabet A = {r, s} where $r, s \in \mathbb{N}$. Then $z \in A^{\mathbb{N}}$ is a (one-sided
infinite) sequence of letters in A. Any $w = w_1 w_2 \cdots w_n \in A^n$ where $n \in \mathbb{N}$ is a word of
length n, and we use the notation $|w| = n$ to denote the length of w. We denote the empty
word by $\varepsilon$. Furthermore, we use the notation $|w|_r$ and $|w|_s$ for the number of letters r and s in
the word w, and, moreover, $|w|_v$ for the number of occurrences of the word v in the word w.

Since we are working in a two-letter alphabet, we can define the following two properties:
Let $\tilde{\ }$ be the operation that exchanges letters, i.e., $\tilde{r} = s$ and $\tilde{s} = r$, extended to any word
$w = w_1 w_2 \cdots w_n$ by $\tilde{w} = \tilde{w}_1 \tilde{w}_2 \cdots \tilde{w}_n$. Then a sequence z is called mirror invariant if

    $w$ occurs in $z$ $\iff$ $\tilde{w}$ occurs in $z$.

Similarly, the operation $\overleftarrow{\ }$ denotes the reversed word $\overleftarrow{w} = w_n w_{n-1} \cdots w_2 w_1$ of a word
$w = w_1 w_2 \cdots w_{n-1} w_n$, and we say that a sequence z is reversal invariant if

    $w$ occurs in $z$ $\iff$ $\overleftarrow{w}$ occurs in $z$.

² One can show that the substitution in (1) is a Pisot substitution with cubic Pisot-Vijayaraghavan number
if $2(r + s) > (r - s)^2$. It is a unimodular Pisot substitution if $r - s = \pm 2$. In the case $2(r + s) < (r - s)^2$, all
roots of the corresponding substitution matrix are greater than 1 in modulus (and cubic algebraic numbers).
A formula for the letter frequencies in the case that one of the odd numbers is 1 can be found in [4].

³ Of course, there are also simple cases where we can rewrite everything using one substitution rule: If
the three letters are equal modulo 3, building blocks of three letters is the key. At least, if we alternate the
three letters periodically in the original sequence.

Let us have a look back at those Kolakoski sequences where either r and s are both odd
or both even:

• If r and s are both even, the corresponding Kolakoski sequence is not mirror invariant:
  If the sequence starts with r, then a run of r's is always followed by an equally long
  run of s's, but a run of s's is followed by a run of r's of length r or s. E.g., the sequence
  z = 2244222244442… has the subword 4444224 but not 2222442.

  This example also shows that such a Kolakoski sequence is not reversal invariant (e.g.,
  2442222 appears in the previous z, while its reversal 2222442 does not).

• If r and s are both odd, the corresponding Kolakoski sequence is also not mirror
  invariant: If the sequence starts with r, we have that $r^r$ is followed by either $s^r$ or $s^s$
  (and $s^s$ by either $r^r$ or $r^s$) while $r^s$ is only followed by $s^s$ (and $s^r$ by $r^r$). E.g., 313331
  appears in the Kolakoski sequence z = 3331113331313331… while 131113 does not.

• However, if r and s are both odd, the corresponding Kolakoski sequence is reversal
  invariant: One can extend any subword w to the left, say to uw, such that uw is a palindrome
  (i.e., such that $\overleftarrow{uw} = uw$) and a subword of the Kolakoski sequence. E.g., in the previous
  example, since 313331 appears, so does 13331313331, and this establishes immediately
  that 133313 appears in the Kolakoski sequence over {1, 3}. The reader may convince
  herself that this construction always works (one can start from the observation that
  $r^r$ is preceded by either $s^r$ or $s^s$, while $r^s$ must be preceded by $s^s$).

Our goal is now to see if we can say something about these properties in the case where
one of the letters {r, s} is odd while the other is even. Here, we will closely follow [14, Section
3]. From now on, we will use the following convention:

    r = min{r, s} and s = max{r, s}.

Let w be a word over A = {r, s}. We define the following "differentiation" rule for w: The
derivative D(w) of w is, in principle, the run-length sequence of w except for (possibly) the
first and last symbol. If w is a single run of length less than s, we set D(w) = ε. If w
consists of more than one run and the first (last) run of w is of length less than or equal to
r, we discard this run. If w consists of more than one run and the first (last) run of w
has length between r + 1 and s, we extend it to a run of length s. The word D(w) is now
the run-length sequence of this altered word and might be the empty word ε (we use the
convention D(ε) = ε). We say that w is differentiable if D(w) is again a word over the same
alphabet A = {r, s}. Let us look at some examples using the alphabet {2, 5}:

    D(255555222) = 55    D(2555552) = 5    D(2255) = ε      D(222555) = 55    D(2222) = ε
    D(25555552) = 6      D(25252) = 111    D(222522) = 51   D(2555222) = 35

Note that the words in the second line are not differentiable! 
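The differentiation rule translates directly into code. The sketch below is one possible reading of the rule, with r < s assumed and the function name `derivative` chosen for illustration; it returns D(w) as a tuple of run lengths, which may contain numbers outside {r, s} when w is not differentiable.

```python
from itertools import groupby


def derivative(w, r, s):
    """D(w) for a word w over {r, s} with r < s, as a tuple of run lengths.
    A first or last run of length <= r is discarded, one of length r+1..s is
    counted as s; a single run shorter than s gives the empty word."""
    lengths = [len(list(group)) for _, group in groupby(w)]
    if len(lengths) <= 1:
        return tuple(n for n in lengths if n >= s)

    def boundary(n):
        # treatment of the first and of the last run
        return [] if n <= r else [max(n, s)]

    return tuple(boundary(lengths[0]) + lengths[1:-1] + boundary(lengths[-1]))


def is_differentiable(w, r, s):
    """True if D(w) is again a word over {r, s}."""
    return all(x in (r, s) for x in derivative(w, r, s))


for word in ["255555222", "2555552", "2255", "222555", "2222",
             "25555552", "25252", "222522", "2555222"]:
    print(word, derivative(word, 2, 5), is_differentiable(word, 2, 5))
```

The first five words reproduce the derivatives 55, 5, ε, 55, ε listed above, and the last four give 6, 111, 51, 35 and are reported as not differentiable.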

The definition of differentiable is chosen such that every subword of a Kolakoski sequence
is differentiable. In fact, every subword of a Kolakoski sequence is smooth or a $C^\infty$-word
with respect to this differentiation rule over the respective alphabet, i.e., it is arbitrarily
often differentiable.

We say that a word v is a primitive of a word w if D(v) = w. From our differentiation
rule (discarding and/or extending the first and last run), one can conclude that each
(nonempty) word has at least $2r^2$ and at most $2s^2$ primitives (the factor 2 appears since we
have $D(v) = w = D(\tilde{v})$, i.e., a word and its mirrored word have the same derivative). E.g.,
over the alphabet {2, 3} the primitives of 33 are: 222333, 3222333, 33222333, 2223332,
22233322, 32223332, 332223332, 322233322, 3322233322, 333222, 2333222, 22333222,
3332223, 33322233, 23332223, 223332223, 233322233, 2233322233.
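The list of primitives of 33 can be reproduced by brute force. The sketch below restates the differentiation rule (with r < s assumed) and simply tests every word up to a given length; the names `derivative` and `primitives` as well as the length cut-off are illustrative choices.

```python
from itertools import groupby, product


def derivative(w, r, s):
    """D(w) as a tuple of run lengths (r < s assumed)."""
    lengths = [len(list(group)) for _, group in groupby(w)]
    if len(lengths) <= 1:
        return tuple(n for n in lengths if n >= s)

    def boundary(n):
        return [] if n <= r else [max(n, s)]

    return tuple(boundary(lengths[0]) + lengths[1:-1] + boundary(lengths[-1]))


def primitives(target, r, s, max_len):
    """All words v over {r, s} with |v| <= max_len and D(v) = target."""
    found = []
    for n in range(1, max_len + 1):
        for v in product((r, s), repeat=n):
            if derivative(v, r, s) == tuple(target):
                found.append("".join(map(str, v)))
    return found


words = primitives((3, 3), 2, 3, max_len=11)
print(len(words))  # prints 18
print(words)
```

All 18 words have length at most 10, so the cut-off of 11 loses nothing here.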

One can now use the differentiation rule to prove the following statements: 

Theorem 2.1
(i) Kolakoski sequences are not eventually periodic (where a sequence z is
called eventually periodic if there exist $m, q \in \mathbb{N}$ such that $z_{i+1} \ldots z_{i+q} = z_{i+q+1} \ldots z_{i+2q}$
for all i > m).

(ii) For a Kolakoski sequence, mirror invariance implies recurrence (where a sequence z is
called recurrent if any word that occurs in z does so infinitely often).

(iii) For a Kolakoski sequence, mirror invariance holds iff each $C^\infty$-word occurs in it.

Proof.

(i) Compare [22] and [12, Example 4]. The reason is that a (minimal) period of length q
in a sequence z yields a period of length q' < q in its run-length sequence. Thus such
a sequence z cannot be equal to its run-length sequence.

(ii) The proof of [14, Proposition 3.1] also applies here.

(iii) The proof of [13, Proposition 2] also applies here. □

We have seen above that in the case where the letters {r, s} are both even or both odd, the
corresponding Kolakoski sequence is not mirror invariant. Of course, since these sequences can be
constructed using a primitive substitution rule, they are recurrent and even repetitive (or
uniformly recurrent): Every word that occurs in the sequence does so with bounded gaps.

However, for all Kolakoski sequences over one even and one odd symbol, nothing seems to
be known beyond the above implications. We don't know whether or not all $C^\infty$-words occur
in such a Kolakoski sequence, or whether or not it is recurrent. In particular, it is not known
whether or not such a Kolakoski sequence is repetitive. The problem with the last property is,
of course, that the gaps might be quite large, thus one has to be careful with claims based
on numerical studies (as in [24, Section 4.1.4]). But one can use $C^\infty$-words to answer the
following question⁴: Given a word w, what is the maximal possible length of v such that
wvw is a $C^\infty$-word and w is not a subword of v? For the classical Kolakoski sequence over
{1, 2} one obtains the following table:



    |w|            1    2    3    4    5    6    7     8     9     10    11
    maximal |v|    2    7    7    36   36   37   173   172   171   170   1230



So, at least all words of length less than 12 must occur with bounded gaps in the classical
Kolakoski sequence, supporting the conjecture that it is repetitive. Note that making this
observation precise would prove that the Kolakoski sequence is repetitive, because this list
tells us that there is no $C^\infty$-word of length greater than 2 × 11 + 1230 = 1252 such that its
prefix of length at most 11 does not occur again within this word. The jumps in this list
are closely related to the "degree" that we introduce in the next section.



3. $C^\infty$-words and the "Kolakoski measure"



We say that a $C^\infty$-word w has degree j if $D^{j}(w) \neq \varepsilon$ and $D^{j+1}(w) = \varepsilon$.
We call $C^\infty$-words of degree 0, i.e., the primitives of the empty word ε, fundamental words.
Note that a fundamental word has length less than max{s, 2r + 1}.

We now define a function μ on the cylinder sets [w] of $A^{\mathbb{N}}$, i.e., $[w] = [w_1 \ldots w_n] =
\{z \in A^{\mathbb{N}} \mid z_1 = w_1, \ldots, z_n = w_n\}$, by

$$\mu([w]) = \begin{cases} \mu([D^{j}(w)]) \cdot (r+s)^{-j} & \text{if } w \text{ is a } C^\infty\text{-word of degree } j, \\ 0 & \text{if } w \text{ is not a } C^\infty\text{-word.} \end{cases}$$

Here, we have to fix the function μ for all fundamental words, and we do so by requiring that
$\mu([\tilde{w}]) = \mu([w])$, $\mu([\overleftarrow{w}]) = \mu([w])$ for all fundamental words w and that $\sum_{w \in A^n} \mu([w]) = 1$
for 1 ≤ n < max{s, 2r + 1}. For example, one has for the fundamental words

    using A = {1, 2}:  μ([1]) = 1/2     μ([2]) = 1/2     μ([12]) = 1/3     μ([21]) = 1/3

    using A = {2, 3}:  μ([2]) = 1/2     μ([3]) = 1/2     μ([23]) = 1/5     μ([32]) = 1/5
                       μ([22]) = 3/10   μ([33]) = 3/10   μ([223]) = 1/5    μ([332]) = 1/5
                       μ([233]) = 1/5   μ([322]) = 1/5   μ([2233]) = 1/5   μ([3322]) = 1/5

Clearly, one has the property $\mu([D(w)]) = (r + s) \cdot \mu([w])$ for any $C^\infty$-word w of length
greater than or equal to max{s, 2r + 1}, and one can use this to show:



⁴ For the question "Given a bound n on |v|, what is the maximal possible length of w such that wvw is a
$C^\infty$-word?" see [7, Proposition 7]: Based on the computations in [9], this length is bounded polynomially
in n, and it is conjectured to be O(n). Also see [11, Section 6.3] and [8] on this question and its connection
to Keane's question.

Theorem 3.1 For any A = {r, s}, the function μ extends to a Borel measure (also denoted
μ) on $A^{\mathbb{N}}$. This measure is mirror invariant, reversal invariant and shift invariant.

Proof. A careful case study as in [14, Theorem 5.1] also works in the general case. □



The aim of introducing this measure is to connect it somehow to the frequencies of
subwords w in a Kolakoski sequence. Indeed, one can show:

Theorem 3.2 Suppose that z is a Kolakoski sequence over A = {r, s}, where one of the
numbers r, s is odd and the other even, and that the frequencies $f_w = \lim_{n \to \infty} |z_1 \ldots z_n|_w / n$ exist
and satisfy $f_w = f_{\tilde{w}}$ for all words occurring in z. Then for all words w we have $f_w = \mu([w])$.

Proof. The proof of [14, Proposition 5.1] carries over to the general case, see [29, Proposition
2.5]. □

This is a nice result - if only we knew that the frequencies satisfy the required
properties. In fact, one can state Keane's question for all Kolakoski sequences where one of
the letters is odd and the other one is even:

    In a Kolakoski sequence over {r, s} (where one letter is odd and the other one is
    even), does the frequency of r exist and if so, does it equal 1/2?



Much computing time has been dedicated to finding evidence for or against the conjecture
that the letter frequency is 1/2. The numerical evidence against it is usually dismissed by
looking at larger and larger parts of the Kolakoski sequence, see [32].

Since already the existence of the letter frequency is in question, one can try to find
bounds on $\limsup_{N \to \infty} |z_1 \cdots z_N|_r / N$ and $\liminf_{N \to \infty} |z_1 \cdots z_N|_r / N$ using the $C^\infty$-words. A
brute force approach is, of course, to generate all $C^\infty$-words of a certain length, say n, and
check for those with the least number⁵ a of letters r (since for any $C^\infty$-word w its mirrored version
$\tilde{w}$ is also a $C^\infty$-word, the maximal number of letters r is n − a). One then has⁶

$$\frac{a}{n} \le \liminf_{N \to \infty} \frac{|z_1 \ldots z_N|_r}{N} \le \frac{1}{2} \le \limsup_{N \to \infty} \frac{|z_1 \ldots z_N|_r}{N} \le \frac{n - a}{n}.$$

For example, one finds the following numbers for alphabets with r + s ≤ 7:

    alphabet                  {1,2}        {2,3}        {1,4}        {3,4}       {2,5}       {1,6}
    length n                  1355         8003         1131         1000        1000        1000
    a = min |w|_r (|w| = n)   669          3989         511          493         481         451
    letter freq.              0.5±0.0063   0.5±0.0016   0.5±0.0482   0.5±0.007   0.5±0.019   0.5±0.049



⁵ I.e., we have $a = \min\{|w|_r : |w| = n \text{ and } w \text{ is a } C^\infty\text{-word}\}$.
⁶ For a proof see [23, Section 3.2].

Alternatively, one can use a generating function approach, see [23] (based on [26]): For
each word w on the alphabet {r, s}, one defines its weight as the monomial $x^{|w|_r} y^{|w|_s} t^{|w|}$. By
summing these weights over all $C^\infty$-words⁷, a lower bound on the frequency is obtained by
looking at the minimal degree of x for a given power of t. The bound $\frac{1}{2} \pm \frac{17}{762} \approx 0.5 \pm 0.0223097$
was obtained using this method for the alphabet A = {1, 2}.
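The brute force approach of this section can be tried out for small word lengths. The following sketch enumerates all words of length n, keeps the $C^\infty$-words and records the extreme letter counts; it assumes r < s, uses illustrative function names, and is only feasible for small n (the lengths quoted in the table above need a far more careful enumeration).

```python
from itertools import groupby, product


def derivative(w, r, s):
    """D(w) as a tuple of run lengths (r < s assumed)."""
    lengths = [len(list(group)) for _, group in groupby(w)]
    if len(lengths) <= 1:
        return tuple(n for n in lengths if n >= s)

    def boundary(n):
        return [] if n <= r else [max(n, s)]

    return tuple(boundary(lengths[0]) + lengths[1:-1] + boundary(lengths[-1]))


def is_smooth(w, r, s):
    """True if w can be differentiated down to the empty word within {r, s}."""
    w = tuple(w)
    while w:
        if any(x not in (r, s) for x in w):
            return False
        w = derivative(w, r, s)
    return True


def extreme_counts(n, r, s):
    """Minimal and maximal number of letters r among C-infinity words of length n."""
    counts = [w.count(r) for w in product((r, s), repeat=n) if is_smooth(w, r, s)]
    return min(counts), max(counts)


for n in (4, 8, 12):
    a, b = extreme_counts(n, 1, 2)
    print(n, a / n, b / n)  # lower and upper bound obtained from length n
```

Because mirroring a word does not change its derivative, the two counts always add up to n, so the bounds are symmetric around 1/2.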



4. Chvátal's bound on the letter frequency

Instead of considering $C^\infty$-words, Chvátal [9] in his unpublished technical report looked at
infinite words over {1, 2} with the property that their run-length sequence is also a sequence
over the same alphabet. A sequence over {r, s} is said to be 1-special. If only runs of length
r and s occur in this sequence, we say that the sequence is 2-special. And if in its
run-length sequence only runs of length r and s occur, we call the original sequence 3-special.
We continue in this way and note that a Kolakoski sequence is d-special for all $d \in \mathbb{N}$.

We now write (a d-special) sequence and its iterated run-length sequences in a special
way in an array: The first row is the original sequence, the second row its run-length sequence
and so on, but we align them appropriately in the columns, writing each run length below the
last letter of its run. E.g., for the classical Kolakoski
sequence (here we use the Kolakoski sequence starting with 1), we write

    12211212212211211221...
    1 2 211 21 2 21 2 21...
    1   2 2 11   21   2...
    1     2  2   11

We now call the ith element in the original sequence d-special if the sequence itself is
d-special and the ith column in this array has length at least d (there are no blanks in the
first d lines of this column). We call this column (of length d) the type of the corresponding
d-special element in the sequence. E.g., the third letter in the Kolakoski sequence above is
2-special of type 22, while the 7th letter is 4-special of type 1122 (it is also 2-special of type
11).
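The array and the types can be computed as follows. This is a sketch under the assumption that each run length is placed in the column of the last letter of its run (which is what the two example types above require); blanks are represented by `None`, and the final run of every row is left blank because it may be incomplete. All function names are illustrative.

```python
def kolakoski(n, first, second):
    """First n terms of the self-describing sequence starting with `first`."""
    seq, letter, other, i = [], first, second, 0
    while len(seq) < n:
        run_len = seq[i] if i < len(seq) else letter
        seq.extend([letter] * run_len)
        letter, other = other, letter
        i += 1
    return seq[:n]


def run_length_rows(seq, depth):
    """rows[k][i] is the entry of row k in column i, or None for a blank."""
    rows = [list(seq)]
    for _ in range(depth - 1):
        prev = rows[-1]
        cols = [i for i, x in enumerate(prev) if x is not None]
        row = [None] * len(seq)
        start = 0
        while start < len(cols):
            end = start
            while end + 1 < len(cols) and prev[cols[end + 1]] == prev[cols[start]]:
                end += 1
            if end + 1 < len(cols):  # a different entry follows: run is complete
                row[cols[end]] = end - start + 1
            start = end + 1
        rows.append(row)
    return rows


def element_type(rows, i, d):
    """Type (column of length d) of the element at 0-based index i, or None."""
    column = [row[i] for row in rows[:d]]
    return None if None in column else tuple(column)


rows = run_length_rows(kolakoski(60, 1, 2), 4)
for row in rows:
    print("".join(" " if x is None else str(x) for x in row[:20]))
print(element_type(rows, 2, 2))  # third letter: prints (2, 2)
print(element_type(rows, 6, 4))  # seventh letter: prints (1, 1, 2, 2)
```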

Now, the observation is that the type of a d-special element determines the first d − 1 terms
of the type of the previous d-special element as well as all the letters between them, and
the type of a d-special element and the last term of the type of the next d-special element
determine the remaining terms of the type of this next d-special element. These properties
can be used to iteratively build graphs $G_d$, d ≥ 1. Since the same observations can be made
about Kolakoski sequences on any alphabet {r, s}, we describe the more general case here.

The vertices of the graph $G_d$ are the types of the d-special elements. Since all elements
of $A^d$ occur as types, the graph $G_d$ has $2^d$ vertices. We connect two vertices u and v by a
directed edge $u \xrightarrow{w} v$ labelled w, if v is the next d-special type after u in a d-special sequence
z. If u is the ith element of such a d-special sequence and v the jth element, then the label
w is the word $z_{i+1} z_{i+2} \cdots z_{j-1} z_j$. So if we follow any (infinite) directed path in such a graph
and read the edge-labels, we get a d-special sequence. Conversely, any d-special sequence
arises as such an infinite path.

⁷ In fact, it is computationally more feasible to sum over all words that just avoid being $C^\infty$-words, i.e.,
words on {r, s} that are not $C^\infty$-words but all of whose (genuine) subwords are. This is the method used in
[23, 26].

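The trick is that one can build the graph $G_{d+1}$ from the graph $G_d$:

• A path $Ar \xrightarrow{w_1} B_1 s \xrightarrow{w_2} B_2 s \xrightarrow{w_3} \cdots \xrightarrow{w_r} B_r s$ in the graph $G_d$ gives rise to the
  edges $Arr \xrightarrow{w_1 w_2 \ldots w_r} B_r sr$ and $Ars \xrightarrow{w_1 w_2 \ldots w_r} B_r sr$ in $G_{d+1}$.

• A path $As \xrightarrow{w_1} B_1 r \xrightarrow{w_2} B_2 r \xrightarrow{w_3} \cdots \xrightarrow{w_r} B_r r$ in the graph $G_d$ gives rise to the
  edges $Asr \xrightarrow{w_1 w_2 \ldots w_r} B_r rr$ and $Ass \xrightarrow{w_1 w_2 \ldots w_r} B_r rr$ in $G_{d+1}$.

• A path $Ar \xrightarrow{w_1} B_1 s \xrightarrow{w_2} B_2 s \xrightarrow{w_3} \cdots \xrightarrow{w_s} B_s s$ in the graph $G_d$ gives rise to the
  edges $Arr \xrightarrow{w_1 w_2 \ldots w_s} B_s ss$ and $Ars \xrightarrow{w_1 w_2 \ldots w_s} B_s ss$ in $G_{d+1}$.

• A path $As \xrightarrow{w_1} B_1 r \xrightarrow{w_2} B_2 r \xrightarrow{w_3} \cdots \xrightarrow{w_s} B_s r$ in the graph $G_d$ gives rise to the
  edges $Asr \xrightarrow{w_1 w_2 \ldots w_s} B_s rs$ and $Ass \xrightarrow{w_1 w_2 \ldots w_s} B_s rs$ in $G_{d+1}$.

The graphs $G_1$ and $G_2$ are:

    [Figure: the graphs $G_1$ and $G_2$]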

To get bounds on the letter frequencies from a graph $G_d$, one associates to an edge with edge
label w the cost $x \cdot |w| - |w|_r$. If one now uses for x a number 1/2 < x < 1 that is smaller than
the maximal possible letter frequency that can occur for a d-special sequence, then one finds
a negative cycle in this graph. Applying this method to the graphs $G_d$ for alphabets with r + s ≤ 7,
one finds the following bounds:

    alphabet       {1,2}        {2,3}        {1,4}        {3,4}        {2,5}        {1,6}
    upper bound    12/23        53/105       592/1085     ?            4834/9527    1478/2821
    letter freq.   0.5±0.0218   0.5±0.0048   0.5±0.0457   0.5±0.0055   0.5±0.0075   0.5±0.0240

By a clever use of the structure of the graphs $G_d$ and efficient programming, Chvátal used
$G_{22}$ in [9], which yields the upper bound 616904/1231743 for the classical Kolakoski sequence
over A = {1, 2}, i.e., the letter frequencies are confined to 0.5 ± 0.000838.
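The graph construction and the negative-cycle test of this section can be sketched for small d. The code below is an illustration, not Chvátal's program: types are tuples, $G_1$ connects every letter to every letter with the target letter as label, the step from $G_d$ to $G_{d+1}$ follows the four path rules above, and the cost of an edge labelled w is taken to be x·|w| minus the number of occurrences of a chosen letter in w (an assumption about the exact form of the cost). Negative cycles are detected with the Bellman-Ford relaxation scheme.

```python
def next_graph(edges, letters):
    """Build the edges of G_{d+1} from the edges of G_d."""
    succ = {}
    for (u, v), w in edges.items():
        succ[(u, v[-1])] = (v, w)  # successor of u whose type ends in v[-1]
    new_edges = {}
    for u in {u for u, _ in edges}:
        x = u[-1]
        y = letters[0] if x == letters[1] else letters[1]
        for k in letters:  # length of the run of y's in the last row
            v, label = u, ()
            for _ in range(k):
                v, w = succ[(v, y)]
                label += w
            for c in letters:  # last term of the extended start type
                new_edges[(u + (c,), v + (k,))] = label
    return new_edges


def build_graph(r, s, depth):
    """Edges of G_depth as a dict {(u, v): label}; types and labels are tuples."""
    letters = (r, s)
    edges = {((a,), (b,)): (b,) for a in letters for b in letters}  # G_1
    for _ in range(depth - 1):
        edges = next_graph(edges, letters)
    return edges


def has_negative_cycle(edges, x, letter):
    """Bellman-Ford test with edge cost x*|w| - (number of `letter` in w)."""
    cost = {e: x * len(w) - w.count(letter) for e, w in edges.items()}
    dist = {u: 0.0 for e in edges for u in e}
    for _ in range(len(dist)):
        changed = False
        for (u, v), c in cost.items():
            if dist[u] + c < dist[v] - 1e-12:
                dist[v] = dist[u] + c
                changed = True
        if not changed:
            return False  # distances settled, so no negative cycle
    return True


g2 = build_graph(1, 2, 2)
print(has_negative_cycle(g2, 0.66, 1))  # prints True
print(has_negative_cycle(g2, 0.67, 1))  # prints False
```

For d = 2 over {1, 2} the densest 2-special sequence is 112112112…, with letter frequency 2/3 for the letter 1, which is why the test switches between x = 0.66 and x = 0.67. Larger d narrow the interval, at a cost that grows with the $2^d$ vertices.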
    length n   number of squares   complexity γ(n)   max. MOOR
    1          2                   2                 2
    2          2                   4                 2
    3          6                   6                 2 2/3
    9          12                  42                2 1/9
    27         24                  486               2 1/27

Table 1: The classical Kolakoski case over A = {1, 2}: There are 46 squares and no cubes.

    length n   number of squares   thereof also cubes   complexity γ(n)   max. MOOR
    1          2                   2                    2                 3
    4          4                   -                    8                 2 1/2
    6          4                   -                    14                2 1/6
    10         20                  2                    30                3
    15         30                  -                    58                2 1/15
    25         100                 16                   130               3 ?

Table 2: For A = {2, 3}, there are 160 squares and 20 cubes among the $C^\infty$-words.

5. Squares (and Cubes)

The question which (and how many) squares occur in the classical Kolakoski sequence was
asked in [27]. Shortly thereafter, Carpi [6, 7] and Lepistö [25] answered the question by
finding all squares that occur: in the classical Kolakoski sequence (i.e., using the alphabet
A = {1, 2}) only squares of length 1, 2, 3, 9 and 27 ([25, Theorem 1], [6, Proposition 1], [7,
Proposition 3]) occur; in particular, it is cube-free ([25, Corollary 1], [6, Proposition 2], [7,
Proposition 4]). Here, a square w of length n is a $C^\infty$-word with |w| = n such that ww is
also a $C^\infty$-word.

The algorithm for finding squares is based on the following observations: If ww is a
square, its derivative has the form D(ww) = uvu where |uv| has to be even (otherwise, not
$ww$ but $w\tilde{w}$ will be a primitive) and |v| ≤ 1 (we have D(w) = u and v arises because of the
rule on how to derive first and last runs). There is one speciality, though, if r < s/2: In these
cases, v might also be a "negative" power of length −1, meaning that in uv the
v cancels the last symbol of u.

Now if one continues differentiating, one gets a sequence of words $D(ww) = u_1 v_1 u_1$,
$D^2(ww) = u_2 v_2 u_2$, …, $D^k(ww) = u_k v_k u_k$. But one can show that in this sequence the length
$|v_k|$ is bounded by $-1 \le |v_k| \le 2s + 1$, where the lower bound −1 can only appear if r < s/2,
see [29, Lemma 4.4] (compare to [25, Lemma 1] for A = {1, 2}). Furthermore, we must
always have that $|u_i v_i|$ is even for all i. Thus, one now has an algorithm to find squares in
Kolakoski sequences: We start with all $C^\infty$-words of the form uvu where u is a fundamental
word, −1 ≤ |v| ≤ 2s + 1 and |uv| is even (and/or v = ε and thus we already have a square
uu). Then construct all primitives which are again of the form u'v'u' with either |u'v'| even
and −1 ≤ |v'| ≤ 2s + 1, and/or which happen to be a square u'u'. Continue in this way.
If there are eventually no more words of this form left, the algorithm stops and one has
calculated all squares among the $C^\infty$-words.

    length n   number of   thereof also   thereof also    complexity   max. MOOR
               squares     cubes          fourth powers   γ(n)
    1          2           2              2               2            4
    2          4           2              -               4            3
    5          10          8              -               20           3 ?
    8          6           -              -               36           2 1/4
    13         4           -              -               96           2 1/13
    20         30          -              -               198          2 1/20
    40         8           -              -               630          2 1/40
    50         152         -              -               964          2 ?
    100        48          -              -               3124         2 ?/50
    116        364         -              -               4160         2 59/116
    134        400         -              -               5438         2 65/134
    174        8           -              -               8658         2 1/174
    241        144         -              -               14694        2 20/241
    259        100         -              -               16588        2 ?
    272        864         -              -               18358        2 145/272
    308        960         -              -               23288        2 1/2
    317        960         -              -               24554        2 160/317
    353        1044        -              -               29738        2 169/353
    408        4           -              -               38462        2 ?
    417        16          -              -               40046        2 1/139
    453        28          -              -               46474        2 1/151
    644        2072        -              -               82292        2 177/322
    716        2252        -              -               100990       2 375/716
    734        2312        -              -               106410       2 375/734
    806        2492        -              -               126570       2 399/806
    975        12          -              -               177330       2 1/975
    1065       12          -              -               208018       2 1/1065
    1529       4960        -              -               376874       2 845/1529
    1691       5404        -              -               451208       2 929/1691
    1709       5404        -              -               460688       2 896/1709
    1745       5500        -              -               480304       2 178/349
    1871       5860        -              -               550730       2 983/1871
    1925       12020       -              -               581470       2 989/1925
    2105       6508        -              -               684994       2 1049/2105

Table 3: For A = {1, 4}, there are 59 964 squares, of which 12 are also cubes and only 2 are
also fourth powers, among the $C^\infty$-words.

We note, however, that it is a priori not clear whether this algorithm will indeed stop,
or if there are only finitely many squares in a Kolakoski sequence besides the classical one.
However, we used this algorithm to check the $C^\infty$-words over the alphabets {2, 3} and {1, 4}:
Both, similar to the classical Kolakoski sequence, have only finitely many squares - there
are a total of 160 different squares of smooth words over {2, 3} but 59 964 squares of smooth
words over {1, 4}. We list the numbers in these cases together with the classical Kolakoski
sequence in Tables 1-3. We also list the number of cubes and fourth powers in these cases
together with the complexity at this word length. The maximal order of repetition, for
short MOOR, or repetition exponent for a $C^\infty$-word w = uv is given by the maximum of
$|ww \ldots wu| / |w|$ such that $ww \ldots wu$ is also a $C^\infty$-word.
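For very short words, the squares can also be found by exhaustive search instead of the primitive-based algorithm described above. The sketch below takes that brute force route as a cross-check (it is not the algorithm of this section): it tests every word w of length n for w and ww both being $C^\infty$-words, assuming r < s; the function names are illustrative.

```python
from itertools import groupby, product


def derivative(w, r, s):
    """D(w) as a tuple of run lengths (r < s assumed)."""
    lengths = [len(list(group)) for _, group in groupby(w)]
    if len(lengths) <= 1:
        return tuple(n for n in lengths if n >= s)

    def boundary(n):
        return [] if n <= r else [max(n, s)]

    return tuple(boundary(lengths[0]) + lengths[1:-1] + boundary(lengths[-1]))


def is_smooth(w, r, s):
    """True if w can be differentiated down to the empty word within {r, s}."""
    w = tuple(w)
    while w:
        if any(x not in (r, s) for x in w):
            return False
        w = derivative(w, r, s)
    return True


def squares(n, r, s):
    """All C-infinity words w of length n such that ww is a C-infinity word."""
    return [w for w in product((r, s), repeat=n)
            if is_smooth(w, r, s) and is_smooth(w + w, r, s)]


for n in range(1, 5):
    found = squares(n, 1, 2)
    print(n, len(found), ["".join(map(str, w)) for w in found])
```

For the classical alphabet this reports 2, 2, 6 and 0 squares for the lengths 1 to 4, in line with the first rows of Table 1. The search space doubles with every additional letter, so this only complements the algorithm above for small n.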

Since there are only finitely many squares, the corresponding Kolakoski sequences cannot
be obtained by a (usual) substitution rule, see [25, Theorem 2] (if a substitution sequence
has one square, one gets infinitely many using the substitution rule repeatedly). Also, the
following conjecture was stated in [7]: For any repetition exponent g > 1, the length of
$C^\infty$-words having this exponent is bounded.



6. Palindromes 

If a $C^\infty$-word w is a palindrome, i.e., if we have $w = \overleftarrow{w}$, then D(w) is also a palindrome.
Conversely, only palindromes of odd length have primitives that are also palindromes. We
have a look at the following table:

    palindrome   primitives
    22           1122, 21122, 11221, 211221, 2211, 12211, 22112, 122112
    212          11211, 211211, 112112, 2112112, 22122, 122122, 221221, 1221221
    121          121121, 212212

Thus, together with a further observation, one has in fact an algorithm for constructing
palindromes, compare [24, Section 4.1.3] (see also [5, 3]): Start with all palindromic
fundamental words. Palindromes of odd length where the letter in the middle is odd will have
palindromes of odd length among their primitives. Palindromes of odd length where the
letter in the middle is even will have palindromes of even length among their primitives.
Palindromes of even length do not have palindromic primitives.

Since w is a palindrome iff $\tilde{w}$ is a palindrome, palindromes (in fact, palindromic
fundamental words) of odd length where the letter in the middle is odd play a special role and can
be used to construct all palindromes. For example, by repeatedly constructing primitives,
one gets the following palindromic two-sided infinite sequence with the number 1 "in the
middle" when starting with the fundamental word 1 over A = {1, 2}:

    …122121122 | 1 | 221121221…

Applying the operation $\tilde{\ }$ to this word yields:

    …211212211 | 2 | 112212112…

The primitives of this infinite sequence are:

    …12112212 | 11 | 21221121…
    …21221121 | 22 | 12112212…

Consequently, looking at the symmetric part of these sequences, one has 2 palindromes of
each length (for details see the cited literature). In fact, since one always has the single letters
as fundamental words, there are for all alphabets at least 2 palindromic $C^\infty$-words for each
length. E.g., for A = {2, 3}, the same construction as before works, where we now have

    …2223322332223 | 3 | 3222332233222…



The situation gets a bit more complicated if there is more than one palindromic fundamental
word of odd length with an odd letter in the middle. E.g., for the alphabet A = {1, 4} the
two fundamental words with the stated property are 1 and⁸ 111. Thus, additional
palindromes appear (but for each length, one has at most two times the number of palindromic
fundamental words of odd length with an odd letter in the middle):



    length   palindromes
    1        1, 4
    2        11, 44
    3        111, 414, 444, 141
    4        1111, 4444
    5        44144, 14141, 11411, 41414
    6        411114, 144441
    7        4441444, 1114111
    8        14111141, 44111144, 41444414, 11444411
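The short palindromes over {1, 4} can be reproduced by exhaustive search rather than by the primitive construction described in this section. The following sketch builds every palindrome of a given length from its left half and keeps those that are $C^\infty$-words; r < s is assumed and the function names are illustrative.

```python
from itertools import groupby, product


def derivative(w, r, s):
    """D(w) as a tuple of run lengths (r < s assumed)."""
    lengths = [len(list(group)) for _, group in groupby(w)]
    if len(lengths) <= 1:
        return tuple(n for n in lengths if n >= s)

    def boundary(n):
        return [] if n <= r else [max(n, s)]

    return tuple(boundary(lengths[0]) + lengths[1:-1] + boundary(lengths[-1]))


def is_smooth(w, r, s):
    """True if w can be differentiated down to the empty word within {r, s}."""
    w = tuple(w)
    while w:
        if any(x not in (r, s) for x in w):
            return False
        w = derivative(w, r, s)
    return True


def palindromes(n, r, s):
    """All palindromic C-infinity words of length n over {r, s}, as strings."""
    result = []
    for left in product((r, s), repeat=(n + 1) // 2):
        # mirror the left half; for odd n the middle letter is not repeated
        w = left + left[::-1][n % 2:]
        if is_smooth(w, r, s):
            result.append("".join(map(str, w)))
    return result


for n in range(1, 9):
    print(n, palindromes(n, 1, 4))
```

The output lists 2, 2, 4, 2, 4, 2, 2 and 4 palindromes for the lengths 1 to 8, the same words as in the table above.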



Generalizations of palindromes, namely words of the form $w v \overleftarrow{w}$ ("palindromes with a gap
in the middle"), have been studied in [16, 17].

⁸ Note that 111 is a single run of length 3 < 4 = s and thus we have D(111) = ε.

7. Complexity 



It is clear that the set of subwords of a Kolakoski sequence is a subset of the $C^\infty$-words over
the same alphabet. Since one, in fact, conjectures that the two sets are even identical, one
tries to establish bounds on the number of $C^\infty$-words for a given length. We denote the
complexity of $C^\infty$-words, i.e., the number of $C^\infty$-words of length n, by γ(n).

Again, one can straightforwardly generalize results by Dekking. 

Theorem 7.1 Let γ(n) be the number of $C^\infty$-words of length n in the alphabet A = {r, s}.
Then

(i) there is an $N \in \mathbb{N}$ such that $\gamma(n) \le n^{a}$ for all n > N, with an explicit exponent a.

(ii) there is an $N \in \mathbb{N}$ and a constant C > 0 such that $\gamma(n) \ge C \cdot n^{d}$ for all n > N,
with an explicit exponent d.

Proof. For a proof in the classical case A = {1, 2}, see [13, Propositions 3 & 4]. For the
generalizations, including the values of the exponents, see [29, Propositions 4.1 & 4.3]. □

For the alphabet A = {1, 2} these bounds have recently been improved in [18] (based on
previous work [33]). In this case, there are positive constants $C_1, C_2$ such that

$$C_1 \cdot n^{2.7087} \le \gamma(n) \le C_2 \cdot n^{2.7102}$$

for all $n \in \mathbb{N}$.

In fact, one can conjecture:

    There are positive constants $C_1, C_2$ such that

    $$C_1 \cdot n^{\delta} \le \gamma(n) \le C_2 \cdot n^{\delta}, \qquad\text{where } \delta = \frac{\ln(r+s)}{\ln\frac{r+s}{2}}.$$



Noting that for A = {1, 2} we have $\delta = \ln 3 / \ln\frac{3}{2} \approx 2.7095$, we see that this conjecture is
well supported by the above result. For A = {2, 3} and A = {1, 4}, we refer to numerical
results that we show in Fig. 1.
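Small values of the complexity and the classical exponent are easy to compute directly. The sketch below counts $C^\infty$-words by exhaustive enumeration (only feasible for short lengths, r < s assumed) and evaluates $\ln 3 / \ln\frac{3}{2}$; the function names are illustrative.

```python
import math
from itertools import groupby, product


def derivative(w, r, s):
    """D(w) as a tuple of run lengths (r < s assumed)."""
    lengths = [len(list(group)) for _, group in groupby(w)]
    if len(lengths) <= 1:
        return tuple(n for n in lengths if n >= s)

    def boundary(n):
        return [] if n <= r else [max(n, s)]

    return tuple(boundary(lengths[0]) + lengths[1:-1] + boundary(lengths[-1]))


def is_smooth(w, r, s):
    """True if w can be differentiated down to the empty word within {r, s}."""
    w = tuple(w)
    while w:
        if any(x not in (r, s) for x in w):
            return False
        w = derivative(w, r, s)
    return True


def complexity(n, r, s):
    """Number of C-infinity words of length n over {r, s}."""
    return sum(1 for w in product((r, s), repeat=n) if is_smooth(w, r, s))


# exponent from the text for the classical alphabet {1, 2}
delta_classical = math.log(3) / math.log(3 / 2)

print([complexity(n, 1, 2) for n in range(1, 10)])
print(round(delta_classical, 4))  # prints 2.7095
```

The first values over {1, 2} are 2, 4, 6, and over {2, 3} one finds 8 words of length 4, matching the complexity columns of Tables 1 and 2.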